Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding py::smart_holder (for smart-pointer interoperability). #2672

Merged
merged 206 commits into from
Feb 24, 2021

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Nov 17, 2020

Background:

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 17, 2020

@eacousineau QQ: did you consider building a custom holder type (using PYBIND11_DECLARE_HOLDER_TYPE) as a way to enable passing std::unique_ptr<T>?

Wondering, could this work:

  • custom holder is e.g. disownable_shared<T>
  • plus caster to/from std::unique_ptr<T>
  • plus caster to/from std::shared_ptr<T>

?

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 18, 2020

Quick update, I tested with this change by @rhaschke (rebased #2046):

master...rhaschke:rvalue-argument-fix2

With this change and a small tweak (expecting TypeError instead of ValueError) test_unique_ptr_member.py runs successfully!

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 19, 2020

To track progress working with the rebased #2046 by @rhaschke (master...rhaschke:rvalue-argument-fix2):

  • test_unique_ptr_member also works with std::shared_ptr<pointee> as holder.

  • After reverting Robert's change to common.h (diff below), all but two pytest scripts run successfully (using current upstream master):

    Before reverting (using TypeError): 20 failures in 9 test_*.py files
    After reverting (using RuntimeError): 9 failures in 2 test_*.py files

    Using RuntimeError, there are:

    • 7 failures in test_factory_constructors.py
    • 2 failures in test_pickling.py
  • When running the Google-global testing with Robert's patch included (and the two pytest scripts with failures disabled) there were no issues related to the change! It looks like we just need to fix up the two pytest scripts.

    The Google-global testing covers at least ~270 pybind11 extensions, with millions of indirect dependents.

diff --git a/include/pybind11/detail/common.h b/include/pybind11/detail/common.h
index af871ca..56385f5 100644
--- a/include/pybind11/detail/common.h
+++ b/include/pybind11/detail/common.h
@@ -719,8 +719,8 @@ PYBIND11_RUNTIME_EXCEPTION(value_error, PyExc_ValueError)
 PYBIND11_RUNTIME_EXCEPTION(type_error, PyExc_TypeError)
 PYBIND11_RUNTIME_EXCEPTION(buffer_error, PyExc_BufferError)
 PYBIND11_RUNTIME_EXCEPTION(import_error, PyExc_ImportError)
-PYBIND11_RUNTIME_EXCEPTION(cast_error, PyExc_TypeError) /// Thrown when pybind11::cast or handle::call fail due to a type casting error
-PYBIND11_RUNTIME_EXCEPTION(reference_cast_error, PyExc_TypeError) /// Used internally
+PYBIND11_RUNTIME_EXCEPTION(cast_error, PyExc_RuntimeError) /// Thrown when pybind11::cast or handle::call fail due to a type casting error
+PYBIND11_RUNTIME_EXCEPTION(reference_cast_error, PyExc_RuntimeError) /// Used internally
 
 [[noreturn]] PYBIND11_NOINLINE inline void pybind11_fail(const char *reason) { throw std::runtime_error(reason); }
 [[noreturn]] PYBIND11_NOINLINE inline void pybind11_fail(const std::string &reason) { throw std::runtime_error(reason); }

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 20, 2020

Current state:

upstream master

  • does not build (but builds if the ptr_owner init (passing unique_ptr) is commented out).

#2046

  • closest to desired behavior, but returning unique_ptr or shared_ptr deletes the pointee prematurely (the returned object is invalidated).

https://github.com/RobotLocomotion/pybind11

  • does not invalidate the disowned Python object (potentially dangling pointer).
  • passing unique_ptr not possible with shared_ptr holder.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 20, 2020

I update this PR with highly experimental vptr_holder code (currently needs C++17, for std::variant). The idea is to cleanly model C++ behavior: a unique_ptr can be promoted to a shared_ptr but not the other way around. I don't know if such a behavior can be achieved with class_. I fear the existing shared_ptr handling may be in the way, but I have to find out. I may try to go back to the idea of building an alternative, maybe named vclass, using the vptr_holder, with the desired caster behavior around vclass.

Before I worked on the vptr_holder I ran into trouble with unique_ptr and shared_ptr returns, using @rhaschke's #2046 code (master...rhaschke:rvalue-argument-fix2). On return, the pointee smart pointer is reset prematurely. I developed doubts if the clean C++-like behavior can be shoe-horned into class_, because it would need to have a shared_ptr but also remember somehow if it's actually only a unique_ptr. That thought led me to the std::variant idea.

@rhaschke
Copy link
Contributor

To be honest, I think that you pursue the wrong approach to realize the automatic conversion of holder types. The std::variant based holder is limited to two holder types: unique and shared ptr, while pybind11, in principle, supports any holder type. However, I agree that these holder types already cover most standard use cases.

Your approach essentially works around a major design flaw of pybind11, namely that it doesn't check for compatibility of holder types when passing them between python and C++: a holder type is simply reinterpret-casted to another one. #2644 (which nobody looked at yet) is an attempt to fix this issue, rejecting attempts to use incompatible holder types.

However, you are striving for implicit holder conversion, the 4th and most complicated goal in my summary #2646. I think, if you want to tackle implicit holder conversion, you should mimic the existing code for implicit type conversions, namely implementing a runtime lookup to find the correct conversion mechanism.

As outlined in #2646, I envision an explicit mechanism to handle holder type conversions.

auto u = std::get_if<0>(&vptr_);
if (u) {
auto result = std::shared_ptr<T>(std::move(*u));
vptr_ = result;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwgk: what happens if the unique_ptr was disowned before?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1f4f5e2 adds test_promotion_of_disowned_to_shared, which works as desired.

rwgk pushed a commit to google/clif that referenced this pull request Dec 14, 2020
…erop_test.

Main motivation for adding this test: inform work related to
pybind/pybind11#2672

Includes a demonstration related to b/175568410 (marked with comments).

PiperOrigin-RevId: 347386087
explicit vptr(std::shared_ptr<T> s) : vptr_{s} { std::cout << std::endl << "explicit vptr(std::shared_ptr<T> s)" << std::endl; }

int ownership_type() const {
if (std::get_if<0>(&vptr_)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return vptr_.index()?

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 19, 2020

Pure analysis of the current inner workings of pybind11.

@rhaschke, @EricCousineau-TRI, @YannickJadoul: please let me know corrections.

test_holder_unique_ptr & test_holder_shared_ptr were added to systematically exercise returning and passing unique_ptr<T>, shared_ptr<T> with unique_ptr, shared_ptr holder.

Observations:

test_holder_unique_ptr:
  make_unique_pointee  OK
  pass_unique_pointee  BUILD_FAIL (as documented)
  make_shared_pointee  Abort free(): double free detected
  pass_shared_pointee  RuntimeError: Unable to load a custom holder type from a default-holder instance

test_holder_shared_ptr:
  make_unique_pointee  Segmentation fault (#1138)
  pass_unique_pointee  BUILD_FAIL (as documented)
  make_shared_pointee  OK
  pass_shared_pointee  OK

test_smart_ptr_base_derived was added to systematically exercise casts between shared_ptr<base> (short s<B>) and shared_ptr<derived> (s<D>). Concrete and polymorphic (virtual) types are handled slightly differently, the polymorphic case is most general:

When converting a s<B> C++ RETURN value with a virtual B:

  • polymorphic_type_hook uses typeid(B) which will yield the std::type_info RTTI for D if B was upcast previously in C++:
    type = src ? &typeid(*src) : nullptr;
  • the RTTI is used to lookup the Python type for D, essentially by typeid().name():
    if (const auto *tpi = get_type_info(*instance_type))
  • a new D instance is created, using an explicit cast from s<B> to s<D>:
    init_holder(inst, v_h, (const holder_type *) holder_ptr, v_h.value_ptr<type>());

This means the s<B> C++ return value appears as a D instance in Python.

When passing a Python D instance to a C++ function with a s<B> ARGUMENT:

For the case of a unique_ptr return value with a shared_ptr holder, the explicit cast from s<B> to s<D> above becomes an explicit cast from u<B> to s<D>, typically leading to a segfault (#1138), although it is a classical case of UB (Undefined Behavior).

For the case of a shared_ptr return value with a unique_ptr holder, the explicit cast is from s<B> to u<D>, which is also UB and typically leads to a double free. (Probably, it doesn't crash earlier because the raw pointers inside s<B> and u<D> are at the same relative memory locations, as suggested earlier by @rhaschke (not sure where).)

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 20, 2020

CORRECTION 2022-05-02: rwgk/rwgk_tbx@7b63b9f — adding py::multiple_inheritance() — fixes the test. The analysis below was incorrect/incomplete. — Sorry I didn't know, and nobody pointed it out, until I finally noticed myself 1.5 years later.


e0207d6 adds a minimal demonstration of UB in the handling of shared_ptr holder. The test is based on https://godbolt.org/z/4fdjaW by jorgbrown@ (thanks Jorg!). See below for the test result (segfault).

$ clang++ --version
clang version 9.0.1-14
Target: x86_64-pc-linux-gnu

$ /usr/bin/python3 $HOME/clone/pybind11_scons/run_tests.py ../pybind11 -vv test_smart_ptr_private_first_base
Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests":
================================================================================= test session starts =================================================================================
platform linux -- Python 3.8.6, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, inifile: pytest.ini
collected 1 item                                                                                                                                                                      

test_smart_ptr_private_first_base.py::test_make_pass FAILED                                                                                                                     [100%]

====================================================================================== FAILURES =======================================================================================
___________________________________________________________________________________ test_make_pass ____________________________________________________________________________________

    def test_make_pass():
        d = m.make_shared_drvd()
        i = m.pass_shared_base(d)
>       assert i == 200
E       assert 0 == 200
E         +0
E         -200

test_smart_ptr_private_first_base.py:9: AssertionError
================================================================================== 1 failed in 0.08s ==================================================================================
Fatal Python error: Segmentation fault

Current thread 0x00007f0a65ada740 (most recent call first):
<no Python frame>

rwgk pushed a commit to rwgk/rwgk_tbx that referenced this pull request Dec 21, 2020
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 21, 2020

I ported test_smart_ptr_private_first_base to Boost.Python: rwgk/rwgk_tbx@debf3a5

It runs without a problem.

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 22, 2020

47b81af adds an additional test designed to demonstrate UB in the shared_ptr holder handling, with root cause in the explicit cast here (this link is also in my comment from 4 days ago):

init_holder(inst, v_h, (const holder_type *) holder_ptr, v_h.value_ptr<type>());

The test segfaults in this line: i = m.pass_shared_drvd(b)

The Boost.Python port of this additional test passes without a problem: rwgk/rwgk_tbx@7d7494a

@EricCousineau-TRI
Copy link
Collaborator

EricCousineau-TRI commented Dec 22, 2020

Haven't yet fully paged into all the work you've done here, but awesome!
I saw some work on virtual inheritance and your mention of segfaulting - can I ask if it's related to slicing of derived objects? (#1145 and other similar issues?) -- I had seen your unittests about inheritance, assumed it was resolved, but was then surprised about the segfaulting

FWIW These are the tests in our fork that I had written to address the issue:
https://github.com/RobotLocomotion/pybind11/blob/58a368ea8e89638e01a7385cad9ce4345d70003d/tests/test_ownership_transfer.py

@EricCousineau-TRI
Copy link
Collaborator

EricCousineau-TRI commented Dec 22, 2020

@eacousineau QQ: did you consider building a custom holder type (using PYBIND11_DECLARE_HOLDER_TYPE) as a way to enable passing std::unique_ptr<T>?

Nope, hadn't considered that! Maybe that'd be better; however, it may still require additional expectations on the holder API, at least for my use cases. For example, to address #1145 (the slicing from inheritance), I wanted to add some form of .release() functionality. (Can explain more if you'd like.)


Also, I read the above summary about the current breakdown. That seems about accurate!

I had trouble disentangling this each time, so I ended up basically only supporting single-inheritance when it came down to holder conversion and/or ownership transfer:
https://github.com/RobotLocomotion/pybind11/blob/58a368ea8e89638e01a7385cad9ce4345d70003d/include/pybind11/cast.h#L492-L851

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 23, 2020

Tracking a line of thought (resulting in an UNPROVEN idea):

This was really useful for me to learn about the caster mechanisms:

https://gist.github.com/rwgk/4fe600fad45442a5d790b53195a4da51

It's an annotated/edited gdb stack trace, for the situation reported by @eacousineau under #1138.

You can see in frames #8, #7, #6 the unique_ptr holder type information is available, but in frames #5, #4 it's lost (all you have there is a const void*), then, after a runtime search for the Python type, in frames #3, #2, #1 the holder type is shared_ptr.

In frame #0 the void* is reinterpret_casted to shared_ptr. That's exactly the root cause of the UB reported in previous comments, even if the holder types match. Looking only at that simpler case: what's needed to achieve well-defined behavior is a std::dynamic_pointer_cast. How can that be achieved with the current high-level code organization? In frame #0 we can at best only have RTTI information for the source type, but not compile-time type information.

This lead me to a new idea:

  std::shared_ptr<void> ivp = std::dynamic_pointer_cast<void>(shared_drvd_uc);
  std::shared_ptr<drvd> shared_drvd_dc = std::reinterpret_pointer_cast<drvd>(ivp);

Complete code: https://godbolt.org/z/sPTae3

It runs to completion. I'm asking our C++ experts if this is well-defined.

Edit ~3 hour later: undoing previous edit, we're still figuring this out.

rwgk pushed a commit to rwgk/rwgk_tbx that referenced this pull request Dec 24, 2020
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 24, 2020

8dba00b adds an additional test designed to demonstrate UB in the handling of polymorphic pointers, with root cause in the reinterpret_cast cast here:

return reinterpret_cast<V *&>(vh[0]);

The new test segfaults as shown below.
Note that this new demo does NOT involve smart pointers at all, unlike the otherwise similar test_smart_ptr_private_first_base.

The Boost.Python port of this additional test passes without a problem: rwgk/rwgk_tbx@97919e3

EDIT 2021-01-01: The root problem is that the pybind11 reinterpret_cast does not perform the required "pointer fixups" aka thunks as described under https://en.wikipedia.org/wiki/Virtual_method_table, section Multiple inheritance and thunks. (This is exactly what the Boost.Python convert_type function does: https://github.com/boostorg/python/blob/5e4f6278e01f018c32cebc4ffc3b75a743fb1042/src/object/inheritance.cpp#L390)

EDIT 2021-01-12: pybind11 actually has functionality for handling thunks (

std::vector<std::pair<const std::type_info *, void *(*)(void *)>> implicit_casts;
) but this functionality is missing bases that are not wrapped or private.

================================================================================= test session starts =================================================================================
platform linux -- Python 3.8.6, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, inifile: pytest.ini
collected 2 items / 1 deselected / 1 selected                                                                                                                                         

test_private_first_base.py::test_make_drvd_pass_base FAILED                                                                                                                     [100%]

====================================================================================== FAILURES =======================================================================================
______________________________________________________________________________ test_make_drvd_pass_base _______________________________________________________________________________

    def test_make_drvd_pass_base():
        d = m.make_drvd()
        i = m.pass_base(d)
>       assert i == 200
E       assert 0 == 200
E         +0
E         -200

test_private_first_base.py:10: AssertionError
=========================================================================== 1 failed, 1 deselected in 0.06s ===========================================================================
Fatal Python error: Segmentation fault

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 24, 2020

@EricCousineau-TRI wrote:

I saw some work on virtual inheritance and your mention of segfaulting - can I ask if it's related to slicing of derived objects? (#1145 and other similar issues?)

I looked briefly at #1145 but I'm not clear, in particular what exactly is slicing? But maybe you can see faster if there is a connection, based on the even more reduced test under #2672 (comment)? The real root cause of what I'm observing is that a pointer obtained through a dynamic_cast<void*> can only safely be cast back directly to the most derived type. Does that sound like it's related to what you call "slicing"?

Here is a godbolt demo for the safe vs unsafe casting back: https://godbolt.org/z/8873Wd

(For future reference, that godbolt code is archived here: https://github.com/rwgk/stuff/blob/56bfaddc45330253d5a8b1c8674db10cb9b633ad/godbolt/safe_vs_unsafe_cast_back_from_void_pointer.cpp)

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 30, 2020

This comment is related to #1333, #2757, #2646.
Keyword: inheritance slicing

9eea907 adds a test similar to the reproducer under #1333.
An equivalent Boost.Python test is here:
https://github.com/rwgk/rwgk_tbx/blob/6c9a6d6bc72d5c1b8609724433259c5b47178680/cpp_base_py_derived_ext.cpp
https://github.com/rwgk/rwgk_tbx/blob/6c9a6d6bc72d5c1b8609724433259c5b47178680/tst_cpp_base_py_derived.py

Observations:

  • Boost.Python fails the simpler test_base, apparently because it's missing a to-python converter for std::shared_ptr<base> when the holder is std::shared_ptr<base_callback> (base_callback is the Boost.Python equivalent of pybind's trampoline feature).
  • Boost.Python passes the more involved test_drvd, where drvd is a Python subclass of the wrapped C++ base.
  • Conversely, pybind11 passes test_base but fails test_drvd, see below. Interestingly, the unexpected 203107 return value means that there are two confounded issues: 1. the cloned shared_ptr<base> is not a pointer for the trampoline class, 2. it is a default-constructed base (_base_num 100) when it should be constructed from the clone function (_base_num 150).
  • The new pybind11 test runs ASAN and MSAN clean, and it is reliably reproducible (in other words Not flakey).

Speculative: I believe Boost.Python passes test_drvd because the shared_ptr<base> in clone_get_num has a custom deleter that keeps the wrapper PyObject alive: https://github.com/boostorg/python/blob/5e4f6278e01f018c32cebc4ffc3b75a743fb1042/include/boost/python/converter/shared_ptr_from_python.hpp#L52

platform linux -- Python 3.8.6, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, inifile: pytest.ini
collected 2 items                                                                                                                                                                     

test_cpp_base_py_derived.py::test_base PASSED                                                                                                                                   [ 50%]
test_cpp_base_py_derived.py::test_drvd FAILED                                                                                                                                   [100%]

====================================================================================== FAILURES =======================================================================================
______________________________________________________________________________________ test_drvd ______________________________________________________________________________________

    def test_drvd():
      d = drvd()
      assert d.get_num() == 200
      assert m.get_num(d) == 200
      dc = d.clone()
      assert dc.get_num() == 250
>     assert m.clone_get_num(d) == 203257
E     assert 203107 == 203257
E       +203107
E       -203257

test_cpp_base_py_derived.py:35: AssertionError
============================================================================= 1 failed, 1 passed in 0.05s =============================================================================

rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Feb 24, 2021
…h (fixes MSVC and GCC failures after iwyu cleanup under PR pybind#2672).
@rwgk rwgk changed the base branch from master to smart_holder February 24, 2021 05:49
@rwgk rwgk merged commit 1bafd5d into pybind:smart_holder Feb 24, 2021
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Feb 24, 2021
@rwgk rwgk removed the needs changelog Possibly needs a changelog entry label Feb 24, 2021
@rwgk rwgk deleted the test_unique_ptr_member branch February 24, 2021 06:10
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Mar 8, 2021
@EricCousineau-TRI EricCousineau-TRI added the smart holder See: https://github.com/pybind/pybind11/blob/smart_holder/README_smart_holder.rst label Apr 26, 2021
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request May 1, 2021
…h (fixes MSVC and GCC failures after iwyu cleanup under PR pybind#2672).
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Jul 11, 2021
…h (fixes MSVC and GCC failures after iwyu cleanup under PR pybind#2672).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
smart holder See: https://github.com/pybind/pybind11/blob/smart_holder/README_smart_holder.rst
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants